Coarse-grained protein-protein stiffnesses and dynamics from all-atom simulations 



On 
O 
O 
(N 

C 

(N 



PQ 

d 



> 

00 

o 

(N 

cn 

o 

o> 
o 



X 



Stephen D. HicksB and C. L. HenlejQ 

Laboratory of Atomic and Solid State Physics, Cornell University, Ithaca, NY 14853-2501 

Large protein assemblies, such as virus capsids, may be coarse-grained as a set of rigid domains linked by 
generalized (rotational and stretching) harmonic springs. We present a method to obtain the elastic parameters 
and overdamped dynamics for these springs from all-atom molecular dynamics simulations of one pair of do- 
mains at a time. The computed relaxation times of this pair give a consistency check for the simulation, and 
(using a fluctuation-dissipation relationship) we find the corrective force needed to null systematic drifts. As a 
first application we predict the stiffness of an HIV capsid layer and the relaxation time for its breathing mode. 

PACS numbers: 87.10.Pq,87.15.ap,87.15.hg,87.15.Ya 



Large protein assemblies — our particular interest is the cap- 
sids (shells) of viruses — are pertinent to most of the soft- 
matter physics in cells; how can one calculate their elastic 
properties and corresponding dynamics? Such assemblies 
are too large to handle by all-atom simulations, but numer- 
ical coarse graining techniques are opening the door to di- 
rect simulations |1]. Nevertheless, we still prefer simpli- 
fied parametrizations for the purposes of human understand- 
ing, analytic treatment, transmission to other researchers, and 
building up coarse-grained models [2]. In this paper we pro- 
pose an approach to extract these simplified parameters from 
all-atom molecular dynamics (MD) of small subsystems. 

Assume we already know how to extract an appropriate 
subsystem and an intelligent way to project the 37V coordi- 
nates of its configurations onto s small number x. (Below, 
we do this explicitly for the case of the HIV capsid protein.) 
Our aim is, from the observed trajectories x(t), to extract the 
parameters for an effective Hamiltonian and equation of mo- 
tion. We will model x(t) as an overdamped random walk in a 
biased harmonic potential. This walk is parametrized primar- 
ily by two important tensors: one to describe the shape of the 
harmonic well, and the other to describe the (mainly hydro- 
dynamic) damping and the associated stochastic noise. Com- 
bining these tensors gives a matrix whose eigenvalues are the 
relaxation rates. With detailed measurement of the dynamics, 
we can identify whether the simulation is equilibrated during 
the simulation time, and can compute the external forces we 
must add so as to measure the behavior near the biologically 
proper configuration. This is similar in spirit to computing a 
potential of mean force or free energy landscape with Jarzyn- 
ski's equality [31], except that our coarse-grained x has more 
than one component, and (at minimum) represents angular 
degrees of freedom in addition to stretching. As an applica- 
tion, we simulate the important inter-domain interactions in 
the HIV capsid and estimate the Young's modulus and Pois- 
son ratio of the capsid lattice, as well as the relaxation rate of 
the breathing mode. 

Coarse graining as stochastic dynamics. We represent our 
system as a vector of generalized coordinates Xi,i= 1 . . . N, 
where N is far smaller than the number of atoms and is ob- 
tained by some form of coarse-graining. Our objective is to 
parametrize and determine from simulation (i) an effective 



free energy potential function U(x), and (ii) an equation of 
motion, for the coarse-grained coordinates. 

We assume the coarse-grained degrees of freedom are over- 
damped: this is true at time scales much longer than the "bal- 
listic scale" of local bond vibrations (tbai ~ lps). Then the 
dynamics is a continuous-time random walk: 
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Tf(x,t)+C(t), 
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where T is the (symmetric) mobility tensor, f(x,t) is the 
force, C(t) is a (Gaussian) stochastic function satisfying 



(CW®C(*'))=2D5(t-t'), 



(2) 



and D is the diffusion tensor. For detailed balance, D = 
k^TT at temperature T. We can expand the potential to sec- 
ond order about a point x*, 

U(x) = U Q - /* • (x - x*) + -(x - x*)K(x - x*), (3) 

where K is the (symmetric) stiffness tensor, then the force 
in dTJ is f(x) = /* — K(x — x*). From measuring coordinate 
covariances in the simulation, we obtain K: 
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[x - x*]) = k B TK- 



(4) 



If the static effective potential were our only interest, and if 
our runs were always long enough to equilibrate our system, 
there would have been no need to model the dynamics (Q]). 
As we do need the dynamics, we determine the diffusion ten- 
sor D (and hence T) by measuring the correlation function at 
short times between the ballistic and relaxation times scales 
(see below) during which the deterministic term in (Q~|) is less 
important than the noise: 
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([x(t') - x(t)] ® [x(f ) - x(t)]) _ W(f - t) 
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(5) 



We average W(Ai)/|At| over possible offsets At ^> tbal, 
inversely weighted by the expected variances oc (At) 3 . This 
weighting also ensures our estimate has negligible contribu- 
tion from t comparable to the relaxation times, at which times 
W(t) is no longer linear in t. Notice that since T pertains to 
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short-time dynamics, it is correctly measured even in runs too 
short to equilibrate in the potential well. 

If we transform into coordinates x = T~ /2 x then the equa- 
tion of motion becomes 



da; 



= rV* - R(2 - £*) +C(t), 



where 



(Ca(tK (t'))=2k B T6 aP 6(t-t'), 



(6) 
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and the relaxation matrix R = r' /2 Kr /2 (which has units 
[time] -1 ) is simply the stiffness tensor in our transformed 
frame. The eigenvalues of R are the decay rates r -1 for the 
relaxation normal modes a. 

The correlation time for a mode is the same as its relax- 
ation time, so the relative error in K for mode a is of order 
\/T a /T mn , where T mn is the total run time. Thus, if all the 
T a <C T lun , our estimate (|4|i of K is valid. But if t q ~ r mn for 
some direction, not only are errors large, but the initial devi- 
ation may still be relaxing over the entire run, which is often 
visible as a steady drift of the coordinates with mean velocity 
v. Averaging over time gives a large spurious variance in the 
drifting directions, leading to an underestimate of the corre- 
sponding stiffness. 

Application to HIV capsid. The elastic and dynamic proper- 
ties of viruses in general are of particular importance in un- 
derstanding the mechanisms by which they assemble and dis- 
assemble. The assembly must be reliable enough to produce 
capsids capable of surviving the harsh intercellular environ- 
ment, while still being able to disassemble upon entering a 
new host cell. HIV in particular is unique because of its char- 
acteristic conical capsids [4], whose mechanism of formation 
is yet unsettled. 

A capsid is well-modeled by a triangular lattice of proteins, 
and we coarse-grain at this level. We take rigid units to rep- 
resent either a whole protein or a sub-domain of a protein. 
Each unit therefore requires six coordinates for its position 
and orientation. Provided the actual interactions are pairwise 
between units, our program is to simulate only a pair of in- 
teracting units at a time, doing a separate simulation for each 
kind of unit-unit contact to obtain its parameters. The coarse- 
grained network is then reassembled and studied using these 
generalized springs. 

The HIV capsid protein (CA) consists of two globular do- 
mains: the larger 145-amino acid N-terminal domain (NTD) 
has a radius 1.3nm and the smaller 70-amino acid C-terminal 
domain (CTD) has a radius 1.7nm; we treat these as two sep- 
arate units. The NTD and CTD are connected covalently by a 
flexible linker; there is also an NTD-NTD interaction (which 
forms hexamers in the capsid structure), a CTD-CTD interac- 
tion (which forms symmetric dimers in the structure), and an 
NTD-CTD interaction between neighboring proteins around 
a hexamer. These four interactions are shown in FIG. Q2 a )- 
We believe the NTD-CTD interaction to be the weakest, and 
the known structure is also poorest, so we will ignore it from 



now on. We therefore simulate each other pair in isolation, 
using structures from the Protein Data Bank yfl. 

We carried out our simulations using a modified version 
of the NAMD 0] package with the CHARMM22 force field. 
Our proteins are in a periodic cell 5 to 9nm to a side using the 
TIP3P model for explicit water and 0.1M NaCl, run with 2fs 
timesteps for a total of 3ns each. We do most of the work in 
at constant pressure and temperature (NPT), using a Langevin 
piston barostat at P = latm, and a Langevin thermostat at 
T = 31 OK and damping rate 7l = 5ps _1 . The NPT sim- 
ulations model the statics well, but the thermostat's damping 
leads to unphysical dynamics with increased relaxation rates. 
This allows shorter simulations to equilibrate, but prevents us 
from determining the rates we should expect to see in reality. 
We therefore do a second measurement of diffusion at con- 
stant volume and energy (NVE). 

The center of mass and global rotation of the pair accounts 
for six trivial degrees of freedom; the remaining six represent 
the relative position and orientation of the two domains. Of 
these six, only one is a pure translation: the distance r = |ra — 
t*i| between the center of each domain. The orientation of 
domain m can be represented by a rotation matrix f2 m which 
rotates the domain from its reference orientation by an angle 
\9 m \ about the axis 9 m . The even and odd combinations 9\ ± 
#2 give six degrees of freedom that comprise the remaining 
five coordinates, along with an overall rotation due to the even 
combination about the inter-body axis r^ — rx. 

As we simulate just one pair of units from a protein com- 
plex, we omit the forces and torques on them due to the other 
units in the lattice, which generically had a nonzero resultant. 
In order to expand the free energy around the physiologically 
relevant configuration, we must add external forces to com- 
pensate; in light of (fl]i the correct force to impose is given by 
/„ = — r _1 tJ, where v is the drift velocity measured in the 
absence of the compensating force. This was not important 
for the pairs reported in our results. 

Results. The results for each simulation were similar, and 
the trajectory of the linker in the transformed relaxation mode 
coordinates is shown in FIG. \T\b), which is characteristic of 
all the observed trajectories. Once we have an equilibrated 
segment of a trajectory we use to determine the 6x6 stiff- 
ness tensor K; different components have different units, so 
it would be mathematically meaningless to diagonalize it di- 
rectly. Instead, we define reduced stiffness tensors, represent- 
ing the free energy cost if we optimize r for a fixed set of 
angles and vice versa. Given 
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then 
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K XT K r 



K rr - KrflKoiKp 
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"■orient — "•»Jr"rr "ro- (10) 

The eigenvalues of the reduced tensors are given in TABLEQ] 
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FIG. 1 : (a) Diagram of interactions in the HIV capsid lattice. The black and white shapes represent the dimer-forming CTD and the hexamer- 
forming NTD, respectively. Springs represent the three different bonds we are interested in, and dotted lines represent the fourth bond we are 
ignoring, (b) Relaxation mode trajectories of linker. The mode coordinate has units of y/as because it has been normalized by the noise. The 
slower modes are drawn with thicker lines. Note that the slowest mode has a very small drift, and we could correct this by applying an external 
force. The traces have been smoothed with a low-pass filter for readability. 
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K orilnt eigenvalues 




(fc B r/nm 2 ) 




(k B T) 


NTD-NTD 


12 


1300 


2800 4500 10000 18000 


CTD-CTD 


9.9 


210 


340 1100 3900 8300 


Linker 


2.8 


130 


250 480 1100 3800 



TABLE I: Effective stiffness eigenvalues for pair simulations: NTD 
dimer, CTD dimer, and the NTD-CTD linker within the CA protein. 



We computed the stiffness tensor implicitly in the rela- 
tive coordinates between the two bodies, but the absolute 
coordinates are the natural frame for computing the noise. 
Measuring the diffusion of a single body in an NVE simu- 
lation yields a mean -Dctd = 0.11rad 2 /ns and ^ntd = 
0.044rad 2 /ns. If we approximate each domain as a solid 
sphere then Stokes' law gives a rotational diffusion constant 
ZM rot ) = /s B T/(87T7/r 3 ) 17]. We thus expect I)£td = 
0.11rad 2 /ns and -D^td = 0.050rad 2 /ns using a viscosity 
V (3WK) _ .69cP. The accepted TIP3P viscosity t/ TIP3P ) = 
0.31cP gives poorer agreement. 

The translational diffusion constant is slightly harder to 
measure, since it is influenced significantly by the finite-size 
effect (Hi. This can be corrected for by measuring the dif- 
fusion at several box side lengths L and using a linear fit of 
£)( tr ) versus l/L to extrapolate to l/L — 0. Doing so yields 



D 



(tr) 
CTD 



= 55A z /ns and £>ntd = 27A 2 /ns. Stokes' law gives 



expected -Dctd = 56A /ns and -D^td = 43A / n s using 
^(TiP3P) _ q.31cP. The measured £H tr ) has a significantly 
larger relative error than D^' ot \ due to the finite-L extrapola- 
tion. 

We can diagonalize the relaxation matrix to compute the 
relaxation modes for each linkage. The NPT relaxation times 
from this calculation are listed in TABLE [II] All the times are 
significantly shorter than the simulation time, so we can be 





relaxation times r a (ps) 


NTD-NTD 
CTD-CTD 
Linker 


120 23 18 9.3 6.0 4.4 
76 26 24 7.8 5.4 4.1 
190 140 80 76 22 8.3 



TABLE II: NPT time constants for the relaxation modes of each pair. 



confident that the simulations are equilibrated. 

Finally, we can compose these generalized springs together 
into a triangular lattice as shown in FIG. [TJ a )> with an NTD 
hexamer at each vertex, a CTD dimer at the midpoint of each 
edge, and a spring connecting each domain, whose free en- 
ergy is given by the relative positions multiplied into the ap- 
propriate stiffness tensor. We can then determine the free en- 
ergy minimum as a function of periodic cell dimensions to 
find a lattice constant of a = 9.1nm. This is slightly smaller 
than the experimentally measured 10.7nm 14J], which may be 
largely due to our sheet being flat, rather than curved into a 
tube. Computing the free energy of simple extension yields a 
2d Young's modulus of 0.92fc B T/A 2 = 0.39N/m and a Pois- 
son ratio of 0.30. Assuming homogeneity and a thickness of 
5nm, we find a 3d Young's modulus of 77MPa (compared 
with 115MPa measured using atomic force microscopy ||9|]). 

Furthermore, we can estimate the relaxation rate of the full- 
capsid breathing mode in water by further coarse-graining to 
a single coordinate a representing a uniform dilation in the 
plane, which has dynamics given by (UJ with stiffness and mo- 
bility constants K a and r a . The projected stiffness is given 
by the bulk modulus K a = 4K 2 d — 2.6fc B T/A 2 , calculated 
from the 2d Young's modulus and Poisson ratio. To project 
the damping term, we observe that all the actual motion in the 
breathing mode of a virus capsid of radius r is in the radial 
direction, and we thus need to scale the capsid protein's trans- 
lational diffusion constant by (da/dr) 2 to find the diffusion 
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constant for a. Using the detailed balance condition, 



r a = 



167rV3J>gj D + 4 t j D 7 ? ( TIP3P ) 
N k B T 7?( 310K ) ' 



(11) 



where N — 16n^/3r 2 / a 2 is the total number of capsid pro- 
teins lfl"oh - Taking N = 1500 proteins as the average size for 
an HIV capsid thus gives a relaxation rate of 6.1ns _1 for the 
breathing mode. 

Discussion. In conclusion, we have put forth a model of 
overdamped random walks in which the statics and dynam- 
ics are described respectively by complementary "stiffness" 
and "mobility" tensors. From these two tensors a "relaxation 
matrix" can be formed, the eigenvalues of which give the re- 
laxation rates, which also provide a convergence test for sim- 
ulations. We demonstrated the usefulness of this model in ex- 
tracting coarse-grained elastic constants from molecular dy- 
namics trajectories of pairs of interacting domains. HIV is 
particularly well-suited for this because the important interac- 
tions appear to be nearest-neighbor, while many other viruses 
have long tails in which all six molecules in the hexamer are 
entwined, making it more difficult to separate into individual 
interactions. 

Evaluating the forces between protein domains to second 
order in the positions and orientations yields a picture of the 
dynamics that is simple enough both to simulate with all-atom 
MD as well as to model at the coarse-grained level, yet general 
enough to thoroughly describe the interaction in the vicinity of 
the simulated configuration. 

Our relaxation formalism bears some similarities to normal 
mode analysis, and in particular, Gaussian network models, 
which replace atomic interactions by springs of uniform stiff- 
ness llllll . While these techniques have been successful in ex- 
plaining reaction pathways such as virus maturation jl2lll3ll . 
they suffer from several shortcomings: first, while the nor- 
mal mode frequencies are useful in identifying soft degrees of 
freedom, the frequencies themselves are well known to be arti- 
ficial because they omit the damping forces of the surrounding 
water. For instance, the breathing mode we computed above 
would have a normal mode frequency of ^jK/m = 60ns -1 . 
Additionally, most applications are coarse-grained to the point 
that individual residue types are irrelevant: such a method is 
entirely insensitive to the effect of point mutations or of vary- 
ing the salinity. Lamm and Szabo 1 14] introduced so-called 
"Langevin modes," which are similar to our relaxation modes, 
but their method still suffers from the latter issues. 

Another quantitative approach to understanding protein 
dynamics is "essential dynamics" (or "principle component 
analysis") 01511 . This technique has the advantage that it 
is based on all-atom simulations, with the explicit damping 
forces and entropic contributions of the solvent, but the result- 
ing modes can only be expressed by giving a 3A-component 
vector. Hayward et al 111611 suggested specifying important 
modes a priori, and this provides us the great advantage being 
able to relate the results of several simulations together into 



a bigger picture. As long as our modes still contain the most 
important fluctuations, they are a reasonable basis to use. 

We have demonstrated the use of relaxational dynamics in 
extracting measurable elastic moduli from small molecular 
dynamics simulations. We hope that this technique will pro- 
vide a convenient middle ground between the atomistic and 
continuum pictures for other biological systems. 
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